{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# RIFRAF clonal accuracy\n",
    "\n",
    "This is a [Jupyter](http://jupyter.org/) notebook that demonstrates how to use RIFRAF on clonal sequence data. This notebook is a work in progress.\n",
    "\n",
    "The method is implemented as a Julia package: [Rifraf.jl](https://github.com/MurrellGroup/Rifraf.jl). The data is available on FigShare: [https://doi.org/10.6084/m9.figshare.5643247](https://doi.org/10.6084/m9.figshare.5643247).\n",
    "\n",
    "A full description of the method has been submitted for publication. A link to the journal article will be added here when it is available."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Setup\n",
    " \n",
    "Before running for the first time, do the following:\n",
    "\n",
    "- Install `Rifraf.jl`.\n",
    "- Download and unzip the data from FigShare (or supply your own data)\n",
    "- Update the paths to the data in the code."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "using Rifraf\n",
    "\n",
    "# update these paths\n",
    "READ_FILE = \"../nl43_data/C01.NL43.fastq\";\n",
    "REFERENCE_FILE = \"../nl43_data/references.fasta\";\n",
    "\n",
    "# set these parameters\n",
    "const ERROR_RANGE = (1e-3, 1e-2)\n",
    "const PHRED_CAP = Int8(30)\n",
    "const LENGTH_CUTOFF = 25\n",
    "\n",
    "const REF_SCORES = Scores(ErrorModel(9.0, 1e-5, 1e-5, 0.5, 0.5))\n",
    "const SEQ_SCORES = Scores(ErrorModel(1.0, 5.0, 5.0, 0.0, 0.0));"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Utility code\n",
    "\n",
    "The following code is used for reading and processing the data."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [],
   "source": [
    "include(\"./clonal_code.jl\");"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Run experiments"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "converged: true\n",
      "execution time: 3.6 seconds\n",
      "found true sequence: true\n"
     ]
    }
   ],
   "source": [
    "function run_experiments()\n",
    "    # read data\n",
    "    references = Rifraf.read_fasta_records(REFERENCE_FILE)\n",
    "    sequences, phreds, names = Rifraf.read_fastq(READ_FILE, seqtype=DNASeq)\n",
    "    \n",
    "    # use first reference as both template as reference, just for demonstration purposes\n",
    "    refseq = DNASeq(sequence(references[1]))\n",
    "    template = DNASeq(sequence(references[1]))\n",
    "    \n",
    "    # sample a set of reads\n",
    "    mean_errors = Float64[mean(Rifraf.phred_to_p(phred)) for phred in phreds]\n",
    "    cand_indices = valid_read_indices(ERROR_RANGE, sequences, phreds, mean_errors; length_cutoff=LENGTH_CUTOFF)\n",
    "    sampled = sample_reads(10, cand_indices, sequences, phreds, names, mean_errors)\n",
    "    (sampled_sequences, sampled_phreds, sampled_names, sampled_indices) = sampled\n",
    "    \n",
    "    # trim ends against the first reference\n",
    "    trimmed_sequences, trimmed_phreds = trim_reads(sampled_sequences, sampled_phreds, refseq)\n",
    "    \n",
    "    # run RIFRAF with capped phreds and the first reference sequence\n",
    "    params = RifrafParams(scores=SEQ_SCORES, ref_scores=REF_SCORES)\n",
    "    capped_phreds = Vector{Int8}[Rifraf.cap_phreds(p, PHRED_CAP) for p in trimmed_phreds]\n",
    "\n",
    "    tic()\n",
    "    result = rifraf(trimmed_sequences, capped_phreds; reference=refseq, params=params)\n",
    "    elapsed = toq()\n",
    "    \n",
    "    println(\"converged: $(result.state.converged)\")\n",
    "    println(\"execution time: $(round(elapsed, 2)) seconds\")\n",
    "    println(\"found true sequence: $(result.consensus == template)\")\n",
    "end\n",
    "\n",
    "run_experiments()"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Julia 0.6.0",
   "language": "julia",
   "name": "julia-0.6"
  },
  "language_info": {
   "file_extension": ".jl",
   "mimetype": "application/julia",
   "name": "julia",
   "version": "0.6.0"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
